{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2db56c9b",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/LanceDBIndexDemo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "db0855d0",
   "metadata": {},
   "source": [
    "# LanceDB Vector Store\n",
    "In this notebook we are going to show how to use [LanceDB](https://www.lancedb.com) to perform vector searches in LlamaIndex"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f44170b2",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "088fe753",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2d1c538",
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "import sys\n",
    "\n",
    "# Uncomment to see debug logs\n",
    "# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)\n",
    "# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n",
    "\n",
    "from llama_index import SimpleDirectoryReader, Document, StorageContext\n",
    "from llama_index.indices.vector_store import VectorStoreIndex\n",
    "from llama_index.vector_stores import LanceDBVectorStore\n",
    "import textwrap"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "26c71b6d",
   "metadata": {},
   "source": [
    "### Setup OpenAI\n",
    "The first step is to configure the openai key. It will be used to created embeddings for the documents loaded into the index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67b86621",
   "metadata": {},
   "outputs": [],
   "source": [
    "import openai\n",
    "\n",
    "openai.api_key = \"\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "073f0a68",
   "metadata": {},
   "source": [
    "Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eef1b911",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f7010b1d-d1bb-4f08-9309-a328bb4ea396",
   "metadata": {},
   "source": [
    "### Loading documents\n",
    "Load the documents stored in the `data/paul_graham/` using the SimpleDirectoryReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c154dd4b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document ID: 855fe1d1-1c1a-4fbe-82ba-6bea663a5920 Document Hash: 4c702b4df575421e1d1af4b1fd50511b226e0c9863dbfffeccb8b689b8448f35\n"
     ]
    }
   ],
   "source": [
    "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()\n",
    "print(\"Document ID:\", documents[0].doc_id, \"Document Hash:\", documents[0].hash)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c0232fd1",
   "metadata": {},
   "source": [
    "### Create the index\n",
    "Here we create an index backed by LanceDB using the documents loaded previously. LanceDBVectorStore takes a few arguments.\n",
    "- uri (str, required): Location where LanceDB will store its files.\n",
    "- table_name (str, optional): The table name where the embeddings will be stored. Defaults to \"vectors\".\n",
    "- nprobes (int, optional): The number of probes used. A higher number makes search more accurate but also slower. Defaults to 20.\n",
    "- refine_factor: (int, optional): Refine the results by reading extra elements and re-ranking them in memory. Defaults to None\n",
    "\n",
    "- More details can be found at the [LanceDB docs](https://lancedb.github.io/lancedb/ann_indexes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8731da62",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store = LanceDBVectorStore(uri=\"/tmp/lancedb\")\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8ee4473a-094f-4d0a-a825-e1213db07240",
   "metadata": {},
   "source": [
    "### Query the index\n",
    "We can now ask questions using our index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a2bcc07",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"How much did Viaweb charge per month?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8cf55bf7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Viaweb charged $100 per month for a small store and $300 per month for a big one.\n"
     ]
    }
   ],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68cbd239-880e-41a3-98d8-dbb3fab55431",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = query_engine.query(\"What did the author do growing up?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fdf5287f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author worked on writing and programming outside of school before college. They wrote short\n",
      "stories and tried writing programs on the IBM 1401 computer. They also mentioned getting a\n",
      "microcomputer, a TRS-80, and started programming on it.\n"
     ]
    }
   ],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6afc84ac",
   "metadata": {},
   "source": [
    "### Appending data\n",
    "You can also add data to an existing index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "069fc099",
   "metadata": {},
   "outputs": [],
   "source": [
    "del index\n",
    "\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    [Document(text=\"The sky is purple in Portland, Maine\")],\n",
    "    uri=\"/tmp/new_dataset\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5cffcfe",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The sky is purple in Portland, Maine.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"Where is the sky purple?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dc99404d",
   "metadata": {},
   "outputs": [],
   "source": [
    "index = VectorStoreIndex.from_documents(documents, uri=\"/tmp/new_dataset\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "676214a2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author started two companies: Viaweb and Y Combinator.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What companies did the author start?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
